Abstract

We prove a new structural lemma for partial Boolean functions $f$, which we call the seed
lemma for DNF. Using the lemma, we give the first subexponential algorithm for proper learning
of DNF in Angluin's Equivalence Query (EQ) model. The algorithm has time and query
complexity $2^{\tilde{O}(\sqrt{n})}$, which is optimal. We also give a new result on certificates
for DNF-size, a simple algorithm for properly PAC-learning DNF, and new results on EQ-learning
$\log n$-term DNF and decision trees.

1 Introduction



Over twenty years ago, Angluin began study of the equivalence query (EQ) learning model [2].
Valiant [20] had asked whether DNF formulas were poly-time learnable in the PAC model; this
question is still open. Angluin asked the same question in the EQ model. Using approximate
fingerprints, she proved that any proper algorithm for EQ-learning DNF formulas requires
super-polynomial query complexity, and hence super-polynomial time. In a proper DNF learning
algorithm, all hypotheses are DNF formulas.

Angluin's work left open the problem of determining the exact complexity of EQ-learning DNF,
both properly and improperly. Tarui and Tsukiji noted that Angluin's fingerprint proof can be
modified to show that a proper EQ algorithm must have query complexity at least
$2^{\tilde{\Omega}(\sqrt{n})}$ [19]. (They did not give details, but we prove this explicitly as a
consequence of a more general result.) The most efficient improper algorithm for EQ-learning DNF
is due to Klivans and Servedio (Corollary 12 of their paper) and runs in time
$2^{\tilde{O}(n^{1/3})}$.

In this paper, we give the first subexponential algorithm for proper learning of DNF in the EQ
model. Our algorithm has time and query complexity that, like the lower bound, is
$2^{\tilde{O}(\sqrt{n})}$.

Our EQ algorithm implies a new result on certificates for DNF size. Hellerstein et al. asked
whether DNF has "poly-size certificates" [14], that is, whether there are polynomials $q$ and $r$
such that for all $s, n > 0$, functions requiring DNF formulas of size greater than $q(s,n)$ have
certificates of size $r(s,n)$ certifying that they do not have DNF formulas of size at most $s$.
(This is equivalent to asking whether DNF can be properly MEQ-learned within polynomial query
complexity [14].) Our result does not resolve this question, but it shows that there are analogous
subexponential certificates. More specifically, it shows that there exists a function
$r(s,n) = 2^{O(\sqrt{n \log s}\,\log n)}$ such that for all $s, n > 0$, functions requiring DNF
formulas of size greater than $r(s,n)$ have certificates of size $r(s,n)$ certifying that they do
not have DNF formulas of size at most $s$.

Our EQ algorithm is based on a new structural lemma for partial Boolean functions $f$, which we
call the seed lemma for DNF. It states that if $f$ has at least one positive example and is
consistent with a DNF of size $s$, then $f$ has a projection $f_p$, induced by fixing the values of
$O(\sqrt{n \log s})$ variables, such that $f_p$ has at least one positive example and is
consistent with a monomial.

We also use the seed lemma for DNF to obtain a new subexponential proper algorithm for
PAC-learning DNFs which is simpler than the previous algorithm of Alekhnovich et al. [1], with
the same bounds. That algorithm uses a procedure that runs multiple recursive calls in
round-robin fashion until one succeeds. In contrast, ours is an iterative procedure with a
straightforward analysis.

Decision trees can be PAC- and EQ-learned in time $n^{O(\log s)}$, where $s$ is the size of the
tree [12, 18]. We prove a seed lemma for decision trees as well, and use it to obtain an algorithm
that learns decision trees using DNF hypotheses in time $n^{O(\log s_1)}$, where $s_1$ is the
number of 1-leaves in the tree. (For any "minimal" tree, the number of 0-leaves is at most
$n s_1$; this bound is tight for the optimal tree computing a monomial of $n$ variables.)

We prove a lower bound result that quantifies the tradeoff between the number of queries needed
to properly EQ-learn DNF formulas, and the size of such queries. One consequence is a lower bound
of $2^{\Omega(\sqrt{n \log n})}$ on the query complexity necessary for an EQ algorithm to learn
DNF formulas of size poly($n$), using DNF hypotheses. This matches the lower bound of
$2^{\tilde{\Omega}(\sqrt{n})}$ mentioned by Tarui and Tsukiji. The bound for our EQ algorithm,
applied to DNF formulas of size poly($n$), differs from this lower bound by only a factor of
$\log n$ in the exponent.

We also prove a result on learning $\log n$-term DNF using DNF hypotheses. Several poly-time
algorithms are known for this problem in the membership and equivalence query (MEQ) model
[9, 6, 15]. We prove that the membership queries are essential: there is no poly($n$)-time
algorithm that learns $O(\log n)$-term DNF using DNF hypotheses, with equivalence queries alone.
In contrast, Angluin and Kharitonov showed that, under cryptographic assumptions, membership
queries do not help in PAC-learning unrestricted DNF formulas [5]. Blum and Singh gave an
algorithm that PAC-learns $\log n$-term DNF using DNF hypotheses of size $n^{O(\log n)}$ in time
$n^{O(\log n)}$ [7]; our results imply that no significant improvement of this result is possible
for PAC-learning $\log n$-term DNF using DNF hypotheses.

2 Preliminaries 

Assignment $x \in \{0,1\}^n$ is a positive example of Boolean function $f(x_1, \ldots, x_n)$ if
$f(x) = 1$, and a negative example if $f(x) = 0$. A sample of $f$ is a set of pairs $(x, f(x))$,
where $x \in \{0,1\}^n$.

A literal is a variable or its negation. A term, also called a monomial, is a possibly empty
conjunction ($\wedge$) of literals. If the term is empty, all assignments satisfy it. The size of a
term is the number of literals in it. We say that term $t$ covers assignment $x$ if $t(x) = 1$. It
is an implicant of Boolean function $f(x_1, \ldots, x_n)$ if $t(x) = 1$ implies $f(x) = 1$. A DNF
(disjunctive normal form) formula is either the constant 0, the constant 1, or a formula of the
form $t_1 \vee \cdots \vee t_k$, where $k \ge 1$ and each $t_i$ is a term. A $k$-term DNF is a DNF
formula consisting of at most $k$ terms. A $k$-DNF is a DNF formula where each term has size at
most $k$. The size of a DNF formula is the number of its terms.

A partial Boolean function $f$ maps $\{0,1\}^n$ to $\{0,1,*\}$, where $*$ means undefined. A
Boolean formula $\phi$ is consistent with a partial function $f$ (and vice versa) if
$\phi(x) = f(x)$ for all $x \in \{0,1\}^n$ where $f(x) \ne *$. If $f$ is a partial function, then
dnf-size($f$) is the size of the smallest DNF formula consistent with $f$.

Let $X_n = \{x_1, \ldots, x_n\}$. A projection of a (partial) function $f(x_1, \ldots, x_n)$ is a
function induced from $f$ by fixing $k$ variables of $f$ to constants in $\{0,1\}$, where
$0 \le k \le n$. We consider the domain of the projection to be the set of assignments to the
remaining $n - k$ variables. If $T$ is a subset of literals over $X_n$, or a term over $X_n$, then
$f_T$ denotes the projection of $f$ induced by setting the literals in $T$ to 1.

For $x \in \{0,1\}^n$ we write $|x|$ to denote $\sum_i x_i$, and $\mathrm{Maj}(x_1, \ldots, x_n)$
to denote the majority function whose value is 1 if $\sum_{i=1}^{n} x_i \ge n/2$ and 0 otherwise.
We write "log" to denote log base 2.

A certificate that a property $P$ holds for a Boolean function $f(x_1, \ldots, x_n)$ is a set
$A \subseteq \{0,1\}^n$ such that for all Boolean functions $g(x_1, \ldots, x_n)$, if $g$ does not
have property $P$, then $f(a) \ne g(a)$ for some $a \in A$. The size of certificate $A$ is the
number of assignments in it.

We use standard models and definitions from computational learning theory. We omit these
here; more information can be found in Appendix A.

We sometimes use the notation $\tilde{O}(\cdot)$, rather than $O(\cdot)$, to denote that we are
suppressing factors that are logarithmic in the arguments to $O(\cdot)$.

3 Seeds 

We introduce the following definition. 

Definition 1. A seed of a partial Boolean function $f(x_1, \ldots, x_n)$ is a (possibly empty)
monomial $T$ that covers at least one positive example of $f$, such that $f_T$ is consistent with
a monomial.

Our new structural lemma is as follows. 

Lemma 2. (Seed lemma for DNF) Let $f$ be a partial Boolean function such that $f(a) = 1$ for
some $a \in \{0,1\}^n$. Let $s$ = dnf-size($f$). Then $f$ has a seed of size at most
$2\sqrt{n \ln s}$.

Proof. Let $\phi$ be a DNF formula of size $s$ = dnf-size($f$) that is consistent with $f$. If
$\phi \equiv 1$, then the empty monomial is a seed. Suppose $\phi \not\equiv 1$. Then since
$f(a) = 1$, $\phi$ has at least one term. Since $\phi$ has size $s$ = dnf-size($f$), it is of
minimum size, so each term of $\phi$ covers at least one positive example of $f$. We construct
seed $T$ from $\phi$ by initializing two sets $Q$ and $R$ to be empty, and then repeating the
following steps until a seed is output:

1. If there is a term $P$ of $\phi$ of size at most $\sqrt{n \ln s}$, output the conjunction of
the literals in $Q \cup P'$ as a seed, where $P'$ is the set of literals in $P$.

2. If all terms of $\phi$ have size greater than $\sqrt{n \ln s}$, check whether there is a
literal $l \notin Q \cup R$ that is satisfied by all positive examples of $f_Q$.

(a) If so, add $l$ to $R$. Set $l$ to 1 in $\phi$ by removing all occurrences of $l$ in the terms
of $\phi$. (There are no occurrences of $\bar{l}$ in $\phi$.)

(b) If not, let $l$ be the literal appearing in the largest number of terms of $\phi$. Add
$\bar{l}$ to $Q$. Set $l$ to 0 in $\phi$ by removing from $\phi$ all terms containing $l$, and
removing all occurrences of $\bar{l}$ in the remaining terms. Also remove any terms which no
longer cover a positive example of $f_{Q \cup R}$.

We now prove that the above procedure outputs a seed satisfying the properties of the lemma.
During execution of Step 2a, no terms are deleted. At the start of execution of Step 2b, there
is a positive example of $f_{Q \cup R}$ that does not satisfy $l$, and hence a term $t$ of $\phi$
that does not contain $l$; the updates made to $\phi$ in Step 2b do not delete $t$. Thus the
following three invariants are maintained by the procedure: (1) $\phi$ contains at least one
(possibly empty) term, (2) $\phi$ is consistent with $f_{Q \cup R}$, and (3) each term of $\phi$
covers at least one positive example of $f_{Q \cup R}$.

Literals are only added to $R$ in Step 2a, when there is a literal $l$ satisfied by all positive
examples of $f_Q$. Thus another invariant holds: (4) for any positive example $a$ of $f$, if $a$
satisfies all literals in $Q$, then $a$ satisfies all literals in $R$.

Since each loop iteration removes a variable from $\phi$, there are at most $n$ iterations. By the
invariants, when $T$ is output, $\phi$ is consistent with $f_{Q \cup R}$, and term $P$ of $\phi$
is satisfied by at least one positive example of $f_{Q \cup R}$. Thus $f_{Q \cup P'}$ has at least
one positive example. Further, since $P$ is a term of $\phi$, and $\phi$ is consistent with
$f_{Q \cup R}$, if an assignment $a$ satisfies $Q \cup P' \cup R$ then $f(a) = 1$ or $f(a) = *$.
Thus $f_{Q \cup P'}$ is consistent with the monomial formed from the literals in $R$, and
$Q \cup P'$ is a seed.

Clearly $P$ has at most $\sqrt{n \ln s}$ literals. We use a standard technique to bound the size
of $Q$ (cf. [3]). Each time a literal is added to $Q$, all terms of $\phi$ have size at least
$\sqrt{n \ln s}$, and thus the literal appearing in the most terms of $\phi$ appears in at least
an $\alpha$ fraction of the terms, for $\alpha = \sqrt{(\ln s)/n}$. So each time a literal is
added to $Q$, at least an $\alpha$ fraction of the terms are removed from $\phi$. When $Q$
contains $r$ literals, $\phi$ contains at most $(1 - \alpha)^r s$ terms. For
$r \ge \sqrt{n \ln s}$, $(1 - \alpha)^r s < e^{-\alpha r} s \le 1$. Since $\phi$ always contains
at least one term, $Q$ contains at most $\sqrt{n \ln s}$ literals. Thus $T$ has size at most
$2\sqrt{n \ln s}$. □

The above bound on seed size is nearly tight for a monotone DNF formula on $n$ variables having
$\sqrt{n}$ disjoint terms, each of size $\sqrt{n}$. The smallest seed for the function it
represents has size $\sqrt{n} - 1$.

4 PAC-learning DNF (and decision trees) using seeds 

We begin by presenting our algorithm for PAC-learning DNFs. It is simpler than our EQ algorithm,
and the ideas used here are helpful in understanding that algorithm. We present only the portion
of the PAC algorithm that constructs the hypothesis from an input sample $S$, and we assume that
the size $s$ of the target DNF formula is known. The rest of the algorithm description is routine
(see e.g. [1]). Let $S^+$ and $S^-$ denote the positive and negative examples in $S$, and let
$f^S$ denote the partial Boolean function that is defined consistently with all assignments in
$S$, and is undefined on all assignments not in $S$. We describe the algorithm here and give the
pseudocode in Appendix B.

The algorithm begins with a hypothesis DNF $h$ that is initialized to 0. It finds terms one by
one and adds them to $h$. Each additional term covers at least one uncovered example in $S^+$, and
terms are added to $h$ until all examples in $S^+$ are covered.

The procedure for finding a term is as follows. First, the algorithm tests each conjunction $T$
of size at most $2\sqrt{n \ln s}$ to determine whether it is a seed of $f^S$. To perform this
test, the algorithm explicitly checks whether $T$ covers at least one positive example in $S$; if
not, $T$ is not a seed. It then checks whether $f^S_T$ is consistent with a monomial, using the
same approach as the standard PAC algorithm for learning monomials [20], as follows. Let $S_T$ be
the set of positive examples in $S$ that satisfy $T$. The algorithm computes term $T'$, which is
the conjunction of the literals that are satisfied by all examples in $S_T$ (so $T'$ includes
$T$). It is easy to show that $f^S_T$ is consistent with a monomial iff all negative examples of
$S$ falsify $T'$. So, the algorithm checks whether all negative examples in $S$ falsify $T'$. If
so, $T$ is a seed, else it is not.
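The seed test just described can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's pseudocode: the representation of a term as a set of `(index, value)` pairs and the names `covers` and `seed_term` are assumptions made here.

```python
def covers(term, x):
    """True if assignment x (a tuple of 0/1) satisfies every literal in term.

    A literal is a pair (index, value), meaning x[index] == value.
    """
    return all(x[i] == v for i, v in term)


def seed_term(T, sample):
    """Return the term T' if T is a seed of the sample's partial function, else None.

    sample is a list of (assignment, label) pairs with label in {0, 1}.
    """
    S_T = [x for x, label in sample if label == 1 and covers(T, x)]
    if not S_T:
        return None  # T covers no positive example, so it is not a seed
    n = len(S_T[0])
    # T' is the conjunction of all literals satisfied by every example in S_T.
    T_prime = frozenset(
        (i, v) for i in range(n) for v in (0, 1) if all(x[i] == v for x in S_T)
    )
    # T is a seed iff every negative example falsifies T'.
    if any(label == 0 and covers(T_prime, x) for x, label in sample):
        return None
    return T_prime
```

For example, on the sample with positives 110, 111, 001 and negatives 010, 100, the term $x_1$ (index 0 set to 1) is a seed with $T' = x_1 x_2$, while the empty term is not a seed.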

By the seed lemma for DNF, at least one seed $T$ will be found. For each seed $T$ found, the
associated term $T'$ is added to $h$, and the positive examples satisfying $T'$ are removed from
$S$. If $S$ still contains a positive example, the procedure is repeated with the new $S$.

The correctness of the algorithm follows immediately from the above discussion. Once a seed
$T$ is found, all positive examples in $S$ that satisfy $T$ are removed from $S$, and thus the
same seed will never be found twice. Thus the algorithm runs in time
$2^{O(\sqrt{n \log s}\,\log n)}$ and outputs a DNF formula of that size.

We can generalize the technique used in the above algorithm. Say that an algorithm uses the
seed covering method if it builds a hypothesis DNF from an input sample $S$ by repeatedly
executing the following steps, until no positive examples remain in the sample: (1) find a seed
$T$ of partial function $f^S$, (2) form a term $T'$ from the positive examples in $S$ that satisfy
$T$, by taking the conjunction of the literals satisfied by all those examples, (3) add term $T'$
to the hypothesis DNF and remove from $S$ all positive examples covered by $T'$.
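A minimal Python sketch of the seed covering method follows, assuming a brute-force search over all conjunctions of at most `max_seed_size` literals. The function names, the `(index, value)` literal encoding, and the `max_seed_size` parameter are illustrative assumptions and do not come from the paper's pseudocode.

```python
from itertools import combinations, product


def covers(term, x):
    """True if assignment x satisfies every literal (index, value) in term."""
    return all(x[i] == v for i, v in term)


def seed_term(T, sample, n):
    """Return T' if T is a seed of the sample's partial function, else None."""
    S_T = [x for x, y in sample if y == 1 and covers(T, x)]
    if not S_T:
        return None
    T_prime = frozenset(
        (i, v) for i in range(n) for v in (0, 1) if all(x[i] == v for x in S_T)
    )
    if any(y == 0 and covers(T_prime, x) for x, y in sample):
        return None
    return T_prime


def seed_cover(sample, n, max_seed_size):
    """Build a DNF (list of terms) consistent with sample by seed covering."""
    S = list(sample)
    hypothesis = []
    while any(y == 1 for _, y in S):
        progress = False
        for k in range(max_seed_size + 1):
            for idxs in combinations(range(n), k):
                for vals in product((0, 1), repeat=k):
                    T_prime = seed_term(frozenset(zip(idxs, vals)), S, n)
                    if T_prime is None:
                        continue
                    # Steps (2) and (3): add T' and drop the positives it covers.
                    hypothesis.append(T_prime)
                    S = [(x, y) for x, y in S if not (y == 1 and covers(T_prime, x))]
                    progress = True
        if not progress:
            raise ValueError("no seed of size <= max_seed_size found")
    return hypothesis
```

Every term added is falsified by all negative examples, and the loop ends only when all positive examples are covered, so the returned DNF is consistent with the sample whenever the size bound is large enough for a seed to exist.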

In fact, the algorithm of Blum and Singh, which PAC-learns $k$-term DNF, implicitly uses the
seed covering method. It first finds seeds of size $k - 1$, then seeds of size $k - 2$, and so
forth. It differs from our DNF-learning algorithm in that it only searches for a restricted type
of seed. Our seeds are constructed from two types of literals, those (in $Q$) that eliminate terms
from the target, and those (in $P$) that satisfy a term. Their algorithm only searches for seeds
containing the first type of literal. Algorithmically, their algorithm works by identifying
subsets of examples satisfying the same subset of terms of the target, while ours works by
identifying subsets of examples satisfying a common term of the target.

We conclude this section by observing that the seed method can also be used to learn decision
trees in time $n^{O(\log s_1)}$, where $s_1$ is the number of 1-leaves in the decision tree. This
follows easily from the following lemma.^1

Lemma 3. (Seed lemma for decision trees) Let $f$ be a partial Boolean function, such that $f$ has
at least one positive example, and $f$ is consistent with a decision tree having $s_1$ leaves that
are labeled 1. Then $f$ has a seed of size at most $\log s_1$.

Proof. Let $J$ be a decision tree consistent with $f$, and let $s_1$ be the number of its leaves
that are labeled 1. Without loss of generality, assume that each 1-leaf of $J$ is reached by at
least one positive example of $f$. Define an internal node of $J$ to be a key node if neither of
its children is a leaf labeled 0. Define the key-depth of a leaf to be the number of key nodes on
the path from the root down to it. It is not hard to show that since $J$ has $s_1$ leaves labeled
1, it must have a 1-leaf with key-depth at most $\log s_1$. Let $p$ be the path from the root to
this 1-leaf. Let $L$ be the set of literals that are satisfied along path $p$. Let $Q$ be the
conjunction of literals in $L$ that come from key nodes, and let $R$ be the conjunction of the
remaining literals. Consider an example $x$ that satisfies $Q$. Consider its path in $J$. If $x$
also satisfies $R$, it will end in the 1-leaf at the end of $p$, else it will diverge from $p$ at
a non-key node, ending at the 0-child of that node. Thus $f_Q$ is consistent with monomial $R$,
$Q$ is a seed of $f$, and $|Q| \le \log s_1$. □

^1 We note that an alternative approach to proving the seed lemma for DNF is to use Bshouty's
result that states that every DNF of size $s$ has a decision tree of size
$2^{\tilde{O}(\sqrt{n})}$ with $O(\sqrt{n})$-DNF formulas in the leaves [8], and then to modify
our proof of the seed lemma for decision trees to accommodate DNFs in the leaves.
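The construction in the proof can be illustrated with a short Python sketch that walks a decision tree and returns the pair $(Q, R)$ for a 1-leaf of minimum key-depth. The tree encoding (a leaf is the integer 0 or 1, an internal node is a tuple `(var, child_if_0, child_if_1)`) and the function name are assumptions chosen for this example.

```python
def min_key_depth_seed(tree):
    """Return (Q, R) for a 1-leaf of minimum key-depth, or None if there is no 1-leaf.

    Q lists the literals (var, value) taken at key nodes on the path to that leaf,
    R lists the literals taken at the remaining (non-key) nodes.
    """
    best = None

    def walk(node, Q, R):
        nonlocal best
        if node == 1:
            if best is None or len(Q) < len(best[0]):
                best = (Q, R)
            return
        if node == 0:
            return
        var, child0, child1 = node
        # A key node is an internal node neither of whose children is a 0-leaf.
        is_key = child0 != 0 and child1 != 0
        for val, child in ((0, child0), (1, child1)):
            if is_key:
                walk(child, Q + [(var, val)], R)
            else:
                walk(child, Q, R + [(var, val)])

    walk(tree, [], [])
    return best
```

For the tree computing the monomial $x_1 x_2 x_3$, every internal node has a 0-leaf child, so there are no key nodes and the empty conjunction $Q$ is returned as the seed, matching $\log s_1 = 0$.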



5 EQ-learning DNF using seeds 

We now present our algorithm for EQ-learning DNF. It can be viewed as learning a decision list
with monomials of bounded size in the nodes, and (implicant) monomials of unbounded size in
the leaves (and a default); we use a variant of the approach used to EQ-learn decision lists with
bounded-size monomials in the nodes, and constant leaves [16, 18]. Like our PAC algorithm, our
EQ algorithm could be generalized to learn other classes with seeds.

Let $\phi$ be the target DNF, and let $s$ be the size of $\phi$. Let $f$ be the function
represented by $\phi$. Let $X = \{x_1, \ldots, x_n\}$ and
$\bar{X} = \{\bar{x}_1, \ldots, \bar{x}_n\}$. Let
$\mathcal{Q} = \{t \subseteq X \cup \bar{X} \mid |t| \le 2\sqrt{n \ln s}\}$. $\mathcal{Q}$ is the
set of potential seeds.

We first introduce the main ideas of the algorithm. Define a sequence of partial functions as
follows. Let $f^{(1)} = f$. For $1 < i \le |\mathcal{Q}|$, let $f^{(i)}$ be the partial function
that is identical to $f^{(i-1)}$ except on positive assignments $a$ of $f^{(i-1)}$ that are
covered by a seed of $f^{(i-1)}$. The value of $f^{(i)}$ on those assignments is $*$. By the seed
lemma for DNF, every positive example of $f$ is covered by a seed of some $f^{(i)}$ in this
sequence.

For each $f^{(i)}$, the algorithm keeps a set of candidate seeds $T$ from $\mathcal{Q}$. With
each such $T$ the algorithm keeps a term $T'$ (which includes the literals in $T$); it stores the
$(T, T')$ pairs in a set $H_i$.

The algorithm constructs a hypothesis DNF formula made up of the terms $T'$ from the pairs
$(T, T')$ in the $H_i$. Intuitively, the goal is to have each $H_i$ contain only pairs $(T, T')$
for actual seeds $T$ of $f^{(i)}$, and for $T'$ to be the conjunction of $T$ and a monomial
consistent with $f^{(i)}_T$. Counterexamples are used to modify the $H_i$ to get closer to this
goal.



We present the details in the pseudocode in Algorithm 1. Note that the $T'$ are initialized to
contain all literals, and thus have no satisfying assignments. The condition $T' \not\equiv 0$
means that $T'$ does not contain a variable and its negation.

We now prove correctness. It is easy to see that each hypothesis $h$ is consistent with all
positive counterexamples received so far. For term $T$, let
$A_{T,i} = \{e \in \{0,1\}^n \mid T(e) = 1 \text{ and } f^{(i)}(e) = 1\}$, and let
$M_{T,i} = \{l \in X \cup \bar{X} \mid l \text{ is satisfied by all } e \in A_{T,i}\}$. We prove
that the following invariant holds: For each $H_i$, if $T$ is a seed of $f^{(i)}$, then $H_i$
contains a pair $(T, T')$ where $T'$ contains all literals in $M_{T,i}$ and $T$. The invariant
holds initially. Assume it holds before processing of a counterexample $e$. If $e$ is a positive
counterexample, then each resulting update modifies a $T'$, where $(T, T') \in H_j$ for some $j$,
and $e$ satisfies $T$. Suppose $T$ is a seed of $f^{(j)}$. Let $i$ be the minimum value such that
$e$ is covered by a seed of $f^{(i)}$. By the invariant, $j \le i$ and $e$ is a positive example
of $f^{(j)}$. Hence $e \in A_{T,j}$ and $e$ satisfies all literals in $M_{T,j}$, so the invariant
holds after the update.

Algorithm 1: EQ Algorithm

    Initialize h = 0. Ask an equivalence query with h. If answer is yes return h,
        else let e be the counterexample received.
    for all $1 \le j \le |\mathcal{Q}|$, $H_j = \{(T, T') \mid T \in \mathcal{Q},\ T' = \bigwedge_{l \in X \cup \bar{X}} l\}$
    while True do
        if e does not satisfy h then   // e is a positive counterexample
            for j = 1 to $|\mathcal{Q}|$ do
                if e satisfies T for some $(T, T') \in H_j$ then
                    for all T such that $(T, T') \in H_j$ and e satisfies T do
                        remove from T' all literals falsified by e
                    end for
                    break out of for j = 1 to $|\mathcal{Q}|$ loop
                end if
            end for
        else   // e is a negative counterexample
            for j = 1 to $|\mathcal{Q}|$ do
                remove from $H_j$ all $(T, T')$ such that T' is satisfied by e
            end for
        end if
        $H^* = \{T' : \text{for some } j,\ (T, T') \in H_j \text{ and } T' \not\equiv 0\}$
        $h = \bigvee_{T' \in H^*} T'$
        Ask an equivalence query with hypothesis h. If answer is yes, return h,
            else let e be the counterexample received.
    end while

Now suppose $e$ is a negative counterexample. If $e$ satisfies $T$ such that $(T, T') \in H_j$,
and $T$ is a seed of $f^{(j)}$, then $f^{(j)}_T$ is consistent with a monomial, so every negative
example of $f$ must falsify $T$ or some literal in $M_{T,j}$. Therefore, by the invariant, $e$
falsifies $T'$. Thus in processing $e$, a pair $(T, T')$ is removed from $H_j$ only if $T$ is not
a seed of $f^{(j)}$, so again the invariant is maintained.

Since each negative counterexample eliminates a pair $(T, T')$ from some $H_j$, the number of
negative counterexamples is $2^{O(\sqrt{n \log s}\,\log n)}$. Since each positive counterexample
eliminates at least one literal from $T'$, in some $(T, T')$, and $h$ is always satisfied by the
positive counterexamples, the number of positive counterexamples is
$2^{O(\sqrt{n \log s}\,\log n)}$. Thus the algorithm will output a correct hypothesis in time
$2^{O(\sqrt{n \log s}\,\log n)}$.
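To make the control flow of Algorithm 1 concrete, here is a small Python sketch that runs it against a brute-force equivalence oracle on a few variables. The oracle, the `(index, value)` literal encoding, and the `max_seed_size` parameter (standing in for the bound $2\sqrt{n \ln s}$) are assumptions of this sketch; it is exponential in $n$ and meant only as an illustration.

```python
from itertools import combinations, product


def satisfies(x, term):
    """True if assignment x satisfies every literal (index, value) in term."""
    return all(x[i] == v for i, v in term)


def eq_oracle(target, hyp_terms, n):
    """Brute-force EQ oracle: return a counterexample, or None if equivalent."""
    for x in product((0, 1), repeat=n):
        if any(satisfies(x, t) for t in hyp_terms) != bool(target(x)):
            return x
    return None


def eq_learn_dnf(target, n, max_seed_size):
    # Candidate seeds: all conjunctions of at most max_seed_size literals.
    Q = [frozenset(zip(idxs, vals))
         for k in range(max_seed_size + 1)
         for idxs in combinations(range(n), k)
         for vals in product((0, 1), repeat=k)]
    all_literals = frozenset((i, v) for i in range(n) for v in (0, 1))
    # H[j] maps each candidate seed T to its term T', initially all literals.
    H = [{T: all_literals for T in Q} for _ in Q]
    hyp, queries = [], 0
    while True:
        queries += 1
        e = eq_oracle(target, hyp, n)
        if e is None:
            return hyp, queries
        if not any(satisfies(e, t) for t in hyp):  # positive counterexample
            for Hj in H:
                hit = [T for T in Hj if satisfies(e, T)]
                if hit:
                    for T in hit:  # drop from T' the literals falsified by e
                        Hj[T] = frozenset((i, v) for i, v in Hj[T] if e[i] == v)
                    break
            else:
                raise RuntimeError("no candidate seed covers e")
        else:  # negative counterexample
            for Hj in H:
                for T in [T for T in Hj if satisfies(e, Hj[T])]:
                    del Hj[T]
        # Hypothesis: all terms T' that do not contain a variable and its negation.
        hyp = list({Tp for Hj in H for Tp in Hj.values()
                    if not any((i, 1 - v) in Tp for i, v in Tp)})
```

Each positive counterexample strips at least one literal from some $T'$ and each negative counterexample deletes at least one pair, so the loop terminates; with `max_seed_size = n` a covering candidate always exists, which is the setting used to exercise the sketch.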

We have proved the following theorem. 

Theorem 4. There is an algorithm that EQ-learns DNF properly in time
$2^{O(\sqrt{n \log s}\,\log n)}$.

Our algorithm can be viewed as an MEQ algorithm that does not make membership queries.
The results of Hellerstein et al. [15] relating certificates and query complexity imply the
following corollary. We also present a direct proof, based on the seed lemma for DNF, in
Appendix C.

Corollary 5. There exists a function $r(s,n) = 2^{O(\sqrt{n \log s}\,\log n)}$ such that for all
$s, n > 0$, for all Boolean functions $f(x_1, \ldots, x_n)$, if dnf-size($f$) $> r(s,n)$, then $f$
has a certificate of size at most $r(s,n)$ certifying that dnf-size($f$) $> s$.

6 A tradeoff between number of queries and size of queries for
properly learning DNF



In this section we give a careful quantitative sharpening of Angluin's approximate fingerprint
proof, which showed that DNF cannot be properly EQ-learned with polynomial query complexity [3].
We thereby prove a tradeoff between the number of queries and the size of queries that a proper
EQ algorithm must use. Suppose that $A$ is any proper EQ algorithm for learning DNF. We show that
if $A$ does not use hypotheses with many terms, then $A$ must make many queries. Our result is the
following (no effort has been made to optimize constants):

Theorem 6. Let $17 \le k \le \sqrt{n/(2 \log n)}$. Let $A$ be any EQ algorithm which learns the
class of all poly($n$)-size DNF formulas using queries which are DNF formulas with at most
$2^{n/k}$ terms. Then $A$ must make at least $n^k$ queries in the worst case.

Taking $k = \Theta(\sqrt{n/\log n})$ in Theorem 6, we see that any algorithm that learns
poly($n$)-term DNF using $2^{\sqrt{n \log n}}$-term DNF hypotheses must make at least
$2^{\Omega(\sqrt{n \log n})}$ queries.

We use the following lemma, which is a quantitative sharpening of Lemma 5 of [3]. The proof
is in Appendix D.1.

Lemma 7. Let $f$ be any $T$-term DNF formula over $n$ variables where $T \ge 1$. For any
$r \ge 1$, either there is a positive assignment $y \in \{0,1\}^n$ (i.e. $f(y) = 1$) such that
$|y| \le r\sqrt{n}$, or there is a negative assignment $z \in \{0,1\}^n$ (i.e. $f(z) = 0$) such
that $n > |z| > n - (\sqrt{n} \ln T)/r - 1$.

Proof of Theorem 6. As in [3] we define $M(n,t,s)$ to be the class of all monotone DNF formulas
over variables $x_1, \ldots, x_n$ with exactly $t$ distinct terms, each containing exactly $s$
distinct variables. Let $M$ denote $\binom{\binom{n}{s}}{t}$, the number of formulas in
$M(n,t,s)$.

For the rest of the proof we fix $t = n^{17}$ and $s = 2k \log n$. We will show that for these
settings of $s$ and $t$ the following holds: given any DNF formula $f$ with at most $2^{n/k}$
terms, there is some assignment $a^f \in \{0,1\}^n$ such that at most $M/n^k$ of the $M$ DNFs in
$M(n,t,s)$ agree with $f$ on $a^f$. This implies that any EQ algorithm using hypotheses that are
DNF formulas with at most $2^{n/k}$ terms must have query complexity at least $n^k$ in the worst
case. (By answering each equivalence query $f$ with the counterexample $a^f$ as described above,
an adversary can cause each equivalence query to eliminate at most $M/n^k$ of the $M$ target
functions in $M(n,t,s)$. Thus after $n^k - 1$ queries there must be at least $M/n^k > 1$ possible
target functions in $M(n,t,s)$ that are still consistent with all queries and responses so far,
so the algorithm cannot be done.)

Recall that $17 \le k \le \sqrt{n/(2 \log n)}$. Let $f$ be any DNF with at most $2^{n/k}$ terms.
Applying Lemma 7 with $r = \sqrt{n}/2$, we get that either there is a positive assignment $y$ for
$f$ with $|y| \le r\sqrt{n} = n/2$, or there is a negative assignment $z$ with
$n > |z| > n - (\sqrt{n} \ln(2^{n/k}))/r - 1 = n - (2 \ln 2)\,n/k - 1 > n - 2n/k$. Let $\phi$ be a
DNF formula randomly and uniformly selected from $M(n,t,s)$. All probabilities below refer to
this draw of $\phi$ from $M(n,t,s)$.

We first suppose that there is a positive assignment $y$ for $f$ with $|y| \le n/2$. In this case
the probability (over the random choice of $\phi$) that any fixed term of $\phi$ (an AND of $s$
randomly chosen variables) is satisfied by $y$ is exactly
$\binom{|y|}{s} / \binom{n}{s} \le (|y|/n)^s \le 1/2^s$. A union bound gives that
$\Pr_\phi[\phi(y) = 1] \le t/2^s$. Thus in this case, at most a $t/2^s$ fraction of formulas in
$M(n,t,s)$ agree with $f$ on $y$. Recalling that $t = n^{17}$, $s = 2k \log n$ and $k \ge 17$, we
get that $t/2^s \le 1/n^k$ as was to be shown.
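The arithmetic behind this last step can be spelled out, taking the parameters to be $t = n^{17}$ and $s = 2k \log n$ with logarithms base 2 as fixed in Section 2:

$$\frac{t}{2^{s}} = \frac{n^{17}}{2^{2k \log n}} = \frac{n^{17}}{n^{2k}} = n^{17 - 2k} \le n^{-k} \quad \text{whenever } k \ge 17.$$

The final inequality is just $17 - 2k \le -k$, which is where the lower bound on $k$ in Theorem 6 is used in this case.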

Next we suppose that there is a negative assignment $z$ for $f$ such that
$n > |z| > n(1 - 2/k)$. At this point we recall the following fact from [3]:

Fact 8 (Lemma 4 of [3]). Let $\phi$ be a DNF formula chosen uniformly at random from $M(n,t,s)$.
Let $z$ be an assignment which is such that $t \le \binom{n}{s} - \binom{|z|}{s}$. Then
$\Pr_\phi[\phi(z) = 0] \le \left(1 - ((|z| - s)/n)^s\right)^t$.

Since $t = n^{17}$, $|z| \le n - 1$, and $s = O(\sqrt{n \log n})$, we indeed have that
$t \le \binom{n}{s} - \binom{|z|}{s}$ as required by the above fact. We thus have

$$\Pr_\phi[\phi(z) = 0] \le \left(1 - \left(1 - \frac{2}{k} - \frac{s}{n}\right)^{s}\right)^{t}.$$

Recalling that $k \le \sqrt{n/(2 \log n)}$ we have that $s/n = 2k \log n / n \le 1/k$, and thus

$$\Pr_\phi[\phi(z) = 0] \le \left(1 - \left(1 - \frac{3}{k}\right)^{s}\right)^{t} = \left(1 - \left(1 - \frac{3}{k}\right)^{2k \log n}\right)^{t}.$$

Using the simple bound $(1 - 1/x)^x \ge 1/4$ for $x \ge 2$, we get that
$(1 - 3/k)^{2k \log n} \ge 1/n^{12}$. Thus we have

$$\Pr_\phi[\phi(z) = 0] \le \left(1 - \frac{1}{n^{12}}\right)^{n^{17}} \le e^{-n^{5}},$$

as was to be shown. This concludes the proof of Theorem 6. □



7 Achieving this tradeoff between number of queries and query 
size for properly learning DNF 

In this section we prove a theorem showing that the tradeoff between number of queries and query 
size established in the previous section is essentially tight. Note that the algorithm A described in 
the proof of the theorem is not computationally efficient. 

Theorem 9. Let $1 \le k \le \frac{3n}{\log n}$ and fix any constant $d > 0$. There is an algorithm
$A$ which learns the class of all $n^d$-term DNF formulas using at most $O(n^{k+d+1})$ DNF
hypothesis equivalence queries, each of which is a $2^{O(n/k)}$-term DNF.

Following [10], the idea of the proof is to have each equivalence query be designed so as to
eliminate at least a $\delta$ fraction of the remaining concepts in the class. It is easy to see
that $O(\log(|C|) \cdot \delta^{-1})$ such equivalence queries suffice to learn a concept class
$C$ of size $|C|$. Thus the main challenge is to show that there is always a DNF hypothesis having
"not too many" terms which is guaranteed to eliminate many of the remaining concepts. This is
done by taking a majority vote over randomly chosen DNF hypotheses in the class, and then showing
that this majority vote of DNFs can itself be expressed as a DNF with "not too many" terms.

Proof of Theorem 9.

At any point in the execution of the algorithm, let CON denote the set of all $n^d$-term DNF
formulas that are consistent with all counterexamples that have been received thus far (so CON is
the "version space" of $n^d$-term DNF formulas that could still be the target concept given what
the algorithm has seen so far).

^2 The statement of Lemma 4 of [3] stipulates that $t \le n$ but it is easy to verify from the
proof that $t \le \binom{n}{s} - \binom{|z|}{s}$ is all that is required.

A simple counting argument gives that there are at most $3^{n^{d+1}}$ DNF formulas of length at
most $n^d$. We describe an algorithm $A$ which makes only equivalence queries which are DNF
formulas with at most $2^{O(n/k)}$ terms and, with each equivalence query, multiplies the size of
CON by a factor which is at most $(1 - 1/n^k)$. After $O(n^{k+d+1})$ such queries the algorithm
will have caused CON to be of size at most 1, which means that it has succeeded in exactly
learning the target concept.

We first set the stage before describing the algorithm. Fix any point in the algorithm's
execution and let CON $= \{f_1, \ldots, f_N\}$ be the set of all consistent $n^d$-term DNF as
described above. Given an assignment $a \in \{0,1\}^n$ and a label $b \in \{0,1\}$, let $N_{a,b}$
denote the number of functions $f_i$ in CON such that $f_i(a) = b$ (so for any $a$ we have
$N_{a,0} + N_{a,1} = N$), and let $N_{a,\min}$ denote $\min\{N_{a,0}, N_{a,1}\}$.

Let $Z$ denote the set of those assignments $a \in \{0,1\}^n$ such that $N_{a,\min} < N/n^k$, so
an assignment is in $Z$ if the overwhelming majority of functions in CON (at least a
$1 - 1/n^k$ fraction) all give the same output on the assignment. We use the following claim,
whose proof is in Appendix D.2.

Claim 10. There is a list of $t = \frac{3n}{k \log n}$ functions
$f_{i_1}, \ldots, f_{i_t} \in$ CON which is such that the function
$\mathrm{Maj}(f_{i_1}, \ldots, f_{i_t})$ agrees with $\mathrm{Maj}(f_1, \ldots, f_N)$ on all
assignments $a \in Z$.

By Claim 10 there must exist some function
$h_{CON} = \mathrm{Maj}(f_{i_1}, \ldots, f_{i_t})$, where each $f_{i_j}$ is an $n^d$-term DNF,
which agrees with $\mathrm{Maj}(f_1, \ldots, f_N)$ on all assignments $a \in Z$. The function
$\mathrm{Maj}(v_1, \ldots, v_t)$ over Boolean variables $v_1, \ldots, v_t$ can be represented as a
monotone $t$-DNF with at most $2^t$ terms. If we substitute the $n^d$-term DNF $f_{i_j}$ for
variable $v_j$, the result is a depth-4 formula with an OR gate at the top of fanin at most $2^t$,
AND gates at the next level each of fanin at most $t$, OR gates at the third level each of fanin
at most $n^d$, and AND gates at the bottom level. By distributing to "swap" the second and third
levels of the formula from AND-of-OR to OR-of-AND and then collapsing the top two levels of
adjacent OR gates and the bottom two levels of adjacent AND gates, we get that $h_{CON}$ is
expressible as a DNF with $2^t \cdot n^{dt} = 2^{O(n/k)}$ terms.

Now we can describe the algorithm $A$ in a very simple way: at each point in its execution, when
CON is the set of all $n^d$-term DNF consistent with all examples received so far as described
above, the algorithm $A$ uses the hypothesis $h_{CON}$ described above as its equivalence query.
To analyze the algorithm we consider two mutually exclusive possibilities for the counterexample
$a$ which is given in response to $h_{CON}$.

Case 1: $a \in Z$. In this case, since $h_{CON}(a)$ agrees with the majority of the values
$f_1(a), \ldots, f_N(a)$, such a counterexample causes the size of CON to be multiplied by a
number which is at most 1/2.

Case 2: $a \notin Z$. In this case we have $N_{a,0}, N_{a,1} \ge N/n^k$, so the counterexample $a$
must cause the size of CON to be multiplied by a number which is at most $(1 - 1/n^k)$. This
proves Theorem 9. □
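The version-space idea behind algorithm $A$ can be illustrated with a small Python sketch. It is a simplification under stated assumptions: it queries the plain majority vote over all of CON (rather than the majority over a short list of functions as in Claim 10), it simulates the equivalence oracle by brute force over an explicit domain, and the toy class of monotone conjunctions and all function names are hypothetical.

```python
from itertools import combinations, product


def monotone_conjunctions(n):
    """Toy concept class (an assumption): all monotone conjunctions over n variables."""
    def conj(idxs):
        return lambda x: all(x[i] for i in idxs)
    return [conj(c) for r in range(n + 1) for c in combinations(range(n), r)]


def halving_learn(concepts, target, domain):
    """Learn target with equivalence queries whose hypothesis is Maj over CON."""
    con = list(concepts)
    queries = 0
    while True:
        current = list(con)

        def hypothesis(x, current=current):
            # Majority vote of the consistent concepts; ties go to 1.
            return 2 * sum(bool(c(x)) for c in current) >= len(current)

        queries += 1
        # Simulated equivalence query: search the domain for a disagreement.
        cex = next((x for x in domain if hypothesis(x) != bool(target(x))), None)
        if cex is None:
            return hypothesis, queries
        # Keep only concepts that agree with the target on the counterexample.
        con = [c for c in con if bool(c(cex)) == bool(target(cex))]
        if not con:
            raise ValueError("target is not in the concept class")
```

Because the hypothesis sides with at least half of CON on every assignment, each counterexample removes at least half of the remaining concepts, which is the Case 1 behaviour above; the sampled majority of Claim 10 is what keeps the hypothesis expressible as a DNF with few terms, at the price of the weaker $(1 - 1/n^k)$ factor in Case 2.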



8 Membership queries provably help for learning logn-term DNF 

The following is a sharpening of the arguments from Section [6] to apply to log(n)-term DNF. 

Theorem 11. Let A be any algorithm which learns the class of all $\log n$-term DNF formulas using
only equivalence queries which are DNF formulas with at most $n^{\log n}$ terms. Then A must make at
least $n^{(\log n)/3}$ equivalence queries in the worst case.

Sketch of Proof of Theorem 11. As in the proof of Theorem 6, we consider $M(n, t, s)$, the class
of all monotone DNF over $n$ variables with exactly $t$ distinct terms each of length exactly $s$. For this
proof we fix $s$ and $t$ both to be $\log n$. We will show that given any DNF formula $f$ with at most $n^{\log n}$
terms, there is an assignment such that at most a $1/n^{(\log n)/3}$ fraction of the DNFs in $M(n, t, s)$
agree with $f$ on that assignment; this implies the theorem by the arguments of Theorem 6. Details
are in Appendix D.3. □
Appendices

A Learning models 

In this appendix, we define the learning models used in this paper. We present the models here 
only as they apply to learning DNF formulas. See e.g. [4] for additional information and more 
general definitions of the models. 

In the PAC learning model [20], a DNF learning algorithm is given as input parameters $\epsilon$ and
$\delta$. It is also given access to an oracle $EX(c, \mathcal{D})$, for a target DNF formula $c$ defined on $X_n$ and a
probability distribution $\mathcal{D}$ over $\{0, 1\}^n$. On request, the oracle produces a labeled example $(x, c(x))$,
where $x$ is randomly generated with respect to $\mathcal{D}$. An algorithm A PAC-learns DNF if for any DNF
formula $c$ on $X_n$, any distribution $\mathcal{D}$ on $\{0, 1\}^n$, and any $0 < \epsilon, \delta < 1$, the following holds: Given
$\epsilon$ and $\delta$, and access to oracle $EX(c, \mathcal{D})$, with probability at least $1 - \delta$, A outputs a hypothesis $h$
such that $\Pr_{x \in \mathcal{D}}[h(x) \ne c(x)] \le \epsilon$. Algorithm A is a proper DNF-learning algorithm if $h$ is a DNF
formula.

In the EQ model [2], a DNF learning algorithm is given access to an oracle that answers
equivalence queries for a target DNF formula $c$ defined on $X_n$. An equivalence query asks "Is $h$
equivalent to target $c$?", where $h$ is a hypothesis. If $h$ represents the same function as $c$, the answer
is "yes"; otherwise, the answer is a counterexample $x \in \{0, 1\}^n$ such that $h(x) \ne c(x)$. If $c(x) = 1$,
$x$ is a positive counterexample, else it is a negative counterexample. Algorithm A EQ-learns DNF
if, for $n > 0$ and any DNF formula $c$ defined on $X_n$, the following holds: if A is given access to an
oracle answering equivalence queries for $c$, then A outputs a hypothesis $h$ representing exactly the
same function as $c$. Algorithm A EQ-learns DNF properly if all hypotheses used (in equivalence
queries, and in the output) are DNF formulas.
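For small $n$, the equivalence oracle can be simulated by brute force. The following is an illustrative sketch only: the DNF representation (a list of frozensets of `(index, value)` literals) and the names `eval_dnf` and `equivalence_query` are assumptions, and a real (possibly adversarial) oracle may return any counterexample, not the lexicographically first one.

```python
from itertools import product

def eval_dnf(dnf, x):
    # A literal (i, v) is satisfied when x[i] == v; a term needs all of its literals.
    return any(all(x[i] == v for i, v in term) for term in dnf)

def equivalence_query(hypothesis, target, n):
    """Return None for "yes", else a pair (x, c(x)) with h(x) != c(x)."""
    for x in product((0, 1), repeat=n):
        label = eval_dnf(target, x)
        if eval_dnf(hypothesis, x) != label:
            return x, label  # label True: positive counterexample; False: negative
    return None

# Target c = x0 AND x1; hypothesis h = x0.
c = [frozenset({(0, 1), (1, 1)})]
h = [frozenset({(0, 1)})]
answer = equivalence_query(h, c, 2)
```

Here the first disagreement is at $x = (1, 0)$, where $h(x) = 1$ and $c(x) = 0$, so the oracle returns a negative counterexample.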

A PAC or EQ learning algorithm learns $k$-term DNF if it satisfies the relevant requirements
above when the target is restricted to be a $k$-term DNF formula.

In variants of the PAC and EQ models, the learning algorithm can ask membership queries
which ask "What is $c(x)$?" for target $c$ and assignment $x$. The answer is the value of $c(x)$.

A PAC algorithm for learning DNF is said to run in time $t = t(n, s, \epsilon, \delta)$ if it takes at most $t$
time steps, and its output hypothesis can be evaluated on any point in its domain in time $t$,
when the target is over $\{0, 1\}^n$ and has size $s$. The time complexity for EQ algorithms is defined
analogously for $t = t(n, s)$.

The query complexity of an EQ learning algorithm is the sum of the sizes of all hypotheses used. 

B Pseudocode for PAC algorithm 

Pseudocode for the PAC algorithm of Section 4.
$X = \{x_1, \ldots, x_n\}$, $\bar{X} = \{\bar{x}_1, \ldots, \bar{x}_n\}$
$Q = \{T \subseteq X \cup \bar{X} \mid |T| \le 2\sqrt{n \ln s}\}$   // set of potential seeds
$h = \emptyset$
while $Q \ne \emptyset$ AND $S^+ \ne \emptyset$ do
    for all $T \in Q$ do
        if $T$ covers at least one $e \in S^+$ then   // test $T$ to see if it is a seed of $f^S$
            $S_T = \{e \mid e \in S^+ \text{ AND } T \text{ covers } e\}$
            $T' = \bigwedge_{l \in B} l$ where $B = \{l \in X \cup \bar{X} \mid l \text{ is satisfied by all } e \in S_T\}$
            if $\{e \mid e \in S^- \text{ AND } e \text{ satisfies } T'\} = \emptyset$ then
                $S^+ = S^+ \setminus S_T$
                $h = h \vee T'$
                Remove $T$ from $Q$
            end if
        end if
    end for
end while
if $S^+ \ne \emptyset$ then
    return fail
else
    return $h$
end if

Algorithm 2: PAC algorithm
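The pseudocode above can be turned into a small runnable sketch for experimenting on tiny samples. This is an illustration under stated assumptions, not the authors' implementation: examples are 0/1 tuples, a literal is an `(index, value)` pair, the names `covers` and `seed_cover` are hypothetical, and a `progress` flag is added so the loop stops when a full pass over $Q$ finds no seed (the pseudocode leaves that case implicit).

```python
import math
from itertools import combinations, product

def covers(term, e):
    # A term (collection of (index, value) literals) covers e if e satisfies every literal.
    return all(e[i] == v for i, v in term)

def seed_cover(pos, neg, n, s):
    """Return a list of terms (a DNF) consistent with the sample, or None on failure."""
    bound = math.floor(2 * math.sqrt(n * math.log(s)))  # seed size bound from the pseudocode
    literals = [(i, v) for i in range(n) for v in (0, 1)]
    Q = [c for size in range(bound + 1) for c in combinations(literals, size)]
    remaining = set(pos)  # S^+
    h = []
    progress = True
    while Q and remaining and progress:
        progress = False
        for cand in list(Q):
            covered = {e for e in remaining if covers(cand, e)}  # S_T
            if not covered:
                continue
            # T' is the AND of all literals satisfied by every example in S_T.
            t_prime = frozenset(l for l in literals
                                if all(e[l[0]] == l[1] for e in covered))
            if not any(covers(t_prime, e) for e in neg):
                remaining -= covered
                h.append(t_prime)
                Q.remove(cand)
                progress = True
    return h if not remaining else None

# Full truth table of the 2-term DNF (x0 AND x1) OR (NOT x2) over 3 variables.
target = lambda x: (x[0] == 1 and x[1] == 1) or x[2] == 0
points = list(product((0, 1), repeat=3))
pos = [x for x in points if target(x)]
neg = [x for x in points if not target(x)]
h = seed_cover(pos, neg, 3, 2)
```

The output DNF is consistent with the sample but need not be minimal; the number of terms is bounded only by the number of candidate seeds.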



C Subexponential certificates for functions of more than subexponential DNF size

We present a direct proof of Corollary 5, based on the seed lemma for DNF.

Proof. Let $s, n > 0$. Let $q(s, n) = 2\sqrt{n \log s}$. Let $f$ be a function on $n$ variables such that
$\text{dnf-size}(f) > n^{q(s,n)}$. We first claim that there exists a partial function $f'$, created by removing
a subset of the positive examples from $f$ and setting them to be undefined, that does not have a
seed of size at most $q(s, n)$. Suppose for contradiction that all such partial functions $f'$ have such
a seed. Let $S$ be the sample consisting of all $2^n$ labeled examples of $f$. We can apply
the seed covering method of Section 4 to produce a DNF consistent with $f$, using a seed of size
at most $q(s, n)$ at every stage. Since no seed will be used more than once, the size of the output DNF
is bounded by the number of terms of size at most $q(s, n)$, which is less than $n^{q(s,n)}$. This contradicts
that $\text{dnf-size}(f) > n^{q(s,n)}$. Thus the claim holds, and $f'$ exists.

Since $f'$ does not have a seed of size at most $q(s, n)$, each term $T$ of size at most $q(s, n)$ either
does not cover any positive examples of $f'$, or the projection $f'_T$ is not consistent with a monomial.
Every function (or partial function) that is not consistent with a monomial has a certificate of size
3 certifying that it has that property, consisting of two positive examples of the function, and a
negative example that is between them (cf. [13]). For assignments $r, x, y \in \{0, 1\}^n$, we say that $r$ is
between $x$ and $y$ if for all $i$, $x_i = r_i$ or $y_i = r_i$. It follows that if $f'_T$ is not consistent with a monomial,
then $f'$ has a certificate $c(T)$ of size 3 proving that fact, consisting of two positive examples of $f'$
that satisfy $T$, and one negative example of $f'$ satisfying $T$ that is between them.

Let $\mathcal{T} = \{T \mid \text{term } T \text{ is such that } |T| \le q(s, n) \text{ and } f'_T \text{ is not consistent with a monomial}\}$. Let
$A = \bigcup_{T \in \mathcal{T}} c(T)$. Clearly $|A| < 3n^{q(s,n)}$. We claim that $A$ is a certificate that $\text{dnf-size}(f) > s$.
Suppose not. Then there exists a function $g$ that is consistent with $f$ on the assignments in $A$,
such that $\text{dnf-size}(g) \le s$. Consider the partial function $h$ which is defined only on the assignments
in $A$, and is consistent with $g$ (and $f$) on those assignments. The partial function $h$ does not
have a seed of size at most $q(s, n)$, because for all terms $T$ of size at most $q(s, n)$, either $T$ does
not cover a positive assignment of $h$, or $A$ contains a certificate that $h_T$ is not consistent with a
monomial. Since $\text{dnf-size}(g) \le s$, and every DNF that is consistent with $g$ is also consistent with
$h$, $\text{dnf-size}(h) \le s$ also. Thus by the seed lemma for DNF, $h$ has a seed of size at most $q(s, n)$.
Contradiction.

□ 



D Proofs 

D.1 Proof of Lemma 7

Proof of Lemma 7. The proof uses the following claim, which is established by a simple greedy
argument:

Claim 12 (Lemma 6 of [3]). Let $\varphi$ be a DNF formula with $T \ge 1$ terms such that each term
contains at least $\alpha n$ distinct unnegated variables, where $0 < \alpha < 1$. Then there is a nonempty set
$V$ of at most $1 + \lfloor \log_b T \rfloor$ variables such that each term of $\varphi$ contains a positive occurrence of some
variable in $V$, where $b = 1/(1 - \alpha)$.

Let $f$ be a $T$-term DNF formula. Since by assumption we have $T \ge 1$, there is at least one
term in $f$ and hence at least one positive assignment $y$ for $f$. If $r \ge \sqrt{n}$ then clearly this positive
assignment $y$ has $|y| \le r\sqrt{n}$, so the lemma holds for $r \ge \sqrt{n}$. Thus we may henceforth assume that
$r < \sqrt{n}$.

Let $\alpha = r/\sqrt{n}$ (note that $0 < \alpha < 1$ as required by Claim 12). If there is some term of $f$ with
fewer than $\alpha n = r\sqrt{n}$ distinct unnegated variables, then we can obtain a positive assignment $y$ for
$f$ with $|y| < r\sqrt{n}$ by setting exactly those variables to 1 which are unnegated in this term and
setting all other variables to 0. So we may suppose that every term of $f$ has at least $\alpha n$ distinct
unnegated variables. Claim 12 now implies that there is a nonempty set $V$ of at most

$$1 + \lfloor \log_{1/(1 - r/\sqrt{n})} T \rfloor \le 1 + \frac{\sqrt{n}}{r} \ln T$$

variables such that each term of $f$ contains a positive occurrence of some variable in $V$. The
assignment $z$ which sets all and only the variables in $V$ to 0 is a negative assignment with
$n > |z| \ge n - (\sqrt{n} \ln T)/r - 1$ (note that $n > |z|$ because $V$ is nonempty), and Lemma 7 is proved. □
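The greedy argument behind Claim 12 and the construction of $z$ can be sketched as follows. This is a minimal illustration under assumptions: each term is given only by its set of unnegated variable indices, and the names `greedy_hitting_set` and `negative_assignment` are hypothetical. At each step the sketch picks a variable occurring positively in the most remaining terms; when every term has at least $\alpha n$ unnegated variables, such a variable hits at least an $\alpha$ fraction of them.

```python
import math

def greedy_hitting_set(terms):
    """terms: list of non-empty sets of unnegated variable indices.

    Returns a list V of variables such that every term contains some variable of V.
    """
    remaining = [set(t) for t in terms]
    V = []
    while remaining:
        counts = {}
        for t in remaining:
            for v in t:
                counts[v] = counts.get(v, 0) + 1
        best = max(counts, key=counts.get)  # variable hitting the most remaining terms
        V.append(best)
        remaining = [t for t in remaining if best not in t]
    return V

def negative_assignment(V, n):
    """Set exactly the variables in V to 0 and every other variable to 1."""
    return tuple(0 if i in set(V) else 1 for i in range(n))

# n = 6, alpha = 1/2 (each term has at least 3 unnegated variables), T = 4 terms.
terms = [{0, 1, 2}, {3, 4, 5}, {0, 3, 4}, {1, 2, 5}]
V = greedy_hitting_set(terms)
z = negative_assignment(V, 6)
```

Every term has an unnegated variable set to 0 by `z`, so `z` falsifies the DNF regardless of any negated literals the terms may also contain, and its weight is $n - |V|$.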



D.2 Proof of Claim 10

Proof. Let functions $f_{i_1}, \ldots, f_{i_t}$ be drawn independently and uniformly from CON. (Note that
$t \ge 1$ by the bound $k \le \frac{3n}{\log n}$.) We show that with nonzero probability the resulting list of functions
has the claimed property.

Fix any $a \in Z$. The probability that $\mathrm{Maj}(f_{i_1}, \ldots, f_{i_t})$ disagrees with $\mathrm{Maj}(f_1, \ldots, f_N)$ on $a$ is
easily seen to be at most

$$\binom{t}{t/2} \cdot \left(\frac{1}{n^k}\right)^{t/2} \le \frac{2^t}{n^{kt/2}}.$$

Recalling that $t = \frac{3n}{k \log n}$, this is less than $1/2^n$ for all $1 \le k \le n$. Since there are at most $2^n$
assignments $a$ in $Z$, a union bound over all $a \in Z$ gives that with nonzero probability (over
the random draw of $f_{i_1}, \ldots, f_{i_t}$) the function $\mathrm{Maj}(f_{i_1}, \ldots, f_{i_t})$ agrees with $\mathrm{Maj}(f_1, \ldots, f_N)$ on all
assignments in $Z$ as claimed. □

Footnote to Claim 12: We stress that $V$ is nonempty because this will be useful for us later.



D.3 Proof of Theorem 11

Proof of Theorem 11. Let $M(n, t, s)$ be the class of all monotone DNF over $n$ variables with
exactly $t$ distinct terms each of length exactly $s$. Fix $s$ and $t$ both to be $\log n$. We will show that
given any DNF formula $f$ with at most $n^{\log n}$ terms, there is an assignment such that at most a
$1/n^{(\log n)/3}$ fraction of the DNFs in $M(n, t, s)$ agree with $f$ on that assignment; this implies the
theorem by the arguments of Theorem 6.

Let $f$ be any DNF formula with at most $T = n^{\log n}$ terms. Applying Lemma 7 to $f$ with $r = 1$,
we may conclude that either there is an assignment $y$ with $|y| \le \sqrt{n}$ and $f(y) = 1$, or there is an
assignment $z$ with $n > |z| \ge n - \sqrt{n}(\log n)^2$ and $f(z) = 0$.

Let $\varphi$ be a DNF formula randomly and uniformly selected from $M(n, t, s)$. All probabilities
below refer to this draw of $\varphi$ from $M(n, t, s)$.

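We first suppose that there is an assignment $y$ with $f(y) = 1$ and $|y| \le \sqrt{n}$. The probability
that any fixed term of $\varphi$ (an AND of $s$ randomly chosen variables) is satisfied by $y$ is exactly

$$\frac{\binom{|y|}{s}}{\binom{n}{s}} \le \left(\frac{|y|}{n}\right)^{s} \le \left(\frac{1}{\sqrt{n}}\right)^{\log n} = \frac{1}{n^{(\log n)/2}}.$$

A union bound gives that $\Pr[\varphi(y) = 1] \le t \cdot \frac{1}{n^{(\log n)/2}} \le \frac{1}{n^{(\log n)/3}}$. So in this case $y$ is an assignment
such that at most a $\frac{1}{n^{(\log n)/3}}$ fraction of formulas in $M(n, t, s)$ agree with $f$ on $y$.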
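The probability bound for a single term can be checked numerically on a small instance. The sketch below is only an illustration: the function name is hypothetical, and the example takes $n = 16$ with logarithms base 2 (an assumption), so that $s = t = \log n = 4$ and $|y| = \sqrt{n} = 4$.

```python
from math import comb

def term_sat_probability(weight, n, s):
    """Probability that a uniformly random s-subset of the n variables lies
    entirely inside the `weight` variables that y sets to 1."""
    return comb(weight, s) / comb(n, s)

n, s, t, weight = 16, 4, 4, 4
p = term_sat_probability(weight, n, s)   # exact probability for one term
per_term_bound = (weight / n) ** s       # the bound (|y|/n)^s
union_bound = t * p                      # bound on Pr[phi(y) = 1]
```

For these values the exact probability is $1/1820$, well below $(|y|/n)^s = 1/256$, and the union bound $4/1820$ is below $n^{-(\log n)/3} = 16^{-4/3}$.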

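Next we suppose that there is an assignment $z$ with $f(z) = 0$ and $n > |z| \ge n - \sqrt{n}(\log n)^2$.
Since $s = t = \log n$ and $|z| \le n - 1$, we have that $t \le \binom{n}{s} - \binom{|z|}{s}$ as required by Fact 8. Applying
Fact 8 we get that

$$
\begin{aligned}
\Pr[\varphi(z) = 0] &\le \left(1 - \left(\frac{n - \sqrt{n}(\log n)^2 - \log n}{n}\right)^{\log n}\right)^{\log n} \\
&\le \left(1 - \left(\frac{n - 2\sqrt{n}(\log n)^2}{n}\right)^{\log n}\right)^{\log n} \\
&= \left(1 - \left(1 - \frac{2(\log n)^2}{\sqrt{n}}\right)^{\log n}\right)^{\log n} \\
&\le \left(1 - \left(1 - \frac{2(\log n)^3}{\sqrt{n}}\right)\right)^{\log n} \\
&= \left(\frac{2(\log n)^3}{\sqrt{n}}\right)^{\log n} \le \left(\frac{1}{n^{1/3}}\right)^{\log n} = \frac{1}{n^{(\log n)/3}}.
\end{aligned}
$$

So in this case $z$ is an assignment such that at most a $1/n^{(\log n)/3}$ fraction of formulas in $M(n, t, s)$
agree with $f$ on $z$. This concludes the proof of Theorem 11. □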